SPdel notebook example file¶

Analyzing data using only nominal data extracted from sequence names on fasta¶

First, we will import the SPdel and all modules need. Note that you need to install before the SPdel. Then, we will use a fasta file with names in the format >genus_species_individual

In [1]:
import SPdel
import os
In [2]:
fasta = './data/Megaleporinus/Megaleporinus_COI.fasta' 
basepath=os.path.dirname(fasta)
Inputs=SPdel.reading_data(fasta)

############################################################################

SPdel v2.0 - Species delimitation and statistics for DNA Barcoding data sets

############################################################################

Sequences are aligned (same size)
Fasta file with 116 sequences and 600 base pairs
In [3]:
nominal=SPdel.run_nominal(basepath,Inputs)
#####################
Nominal MOTUs
#####################

#####LS_brn#####
LS_brn_L930, LS_brn_L931, LS_brn_L932

#####LS_con#####
LS_con_L210, LS_con_L211, LS_con_L286, LS_con_L291, LS_con_L292, LS_con_L820

#####LS_elo#####
LS_elo_L1041, LS_elo_L1046, LS_elo_L1047, LS_elo_L287, LS_elo_L300, LS_elo_L305, LS_elo_L307, LS_elo_L308, LS_elo_L309

#####LS_gar#####
LS_gar_L293, LS_gar_L294, LS_gar_L295, LS_gar_L296, LS_gar_L298

#####LS_mac#####
LS_mac_B061, LS_mac_B082, LS_mac_B086, LS_mac_B087, LS_mac_B088, LS_mac_B089, LS_mac_L083, LS_mac_L085, LS_mac_L178, LS_mac_L212, LS_mac_L225, LS_mac_L290, LS_mac_L890, LS_mac_L891

#####LS_muy#####
LS_muy_L907, LS_muy_L913, LS_muy_L914, LS_muy_L915

#####LS_obt#####
LS_obt_B031, LS_obt_B070, LS_obt_B071, LS_obt_B074, LS_obt_B075, LS_obt_B077, LS_obt_B090, LS_obt_B100, LS_obt_B101, LS_obt_B102, LS_obt_B103, LS_obt_L004, LS_obt_L007, LS_obt_L008, LS_obt_L009, LS_obt_L013, LS_obt_L016, LS_obt_L084, LS_obt_L253, LS_obt_L266, LS_obt_L267, LS_obt_L268, LS_obt_L269, LS_obt_L282, LS_obt_L283, LS_obt_L314, LS_obt_L315, LS_obt_L316, LS_obt_L320, LS_obt_L547, LS_obt_L548

#####LS_piv#####
LS_piv_B060, LS_piv_B076, LS_piv_B078, LS_piv_B093, LS_piv_B094, LS_piv_B095, LS_piv_B096, LS_piv_B097, LS_piv_B098, LS_piv_B099, LS_piv_B140, LS_piv_B141, LS_piv_B142, LS_piv_B144, LS_piv_L002, LS_piv_L003, LS_piv_L005, LS_piv_L006, LS_piv_L010, LS_piv_L011, LS_piv_L014, LS_piv_L284, LS_piv_L371

#####LS_rei#####
LS_rei_B072, LS_rei_B073, LS_rei_L338, LS_rei_L339, LS_rei_L341, LS_rei_L342, LS_rei_L343, LS_rei_L353, LS_rei_L355, LS_rei_L773, LS_rei_L777, LS_rei_L778, LS_rei_L779

#####LS_tri#####
LS_tri_L179, LS_tri_L180, LS_tri_L182, LS_tri_L519, LS_tri_L618, LS_tri_L621, LS_tri_L690, LS_tri_L955


Using k2p distance

After calculating all genetic distances, we can obtain a summary table with the mean and maximum of intraspecific distance, the nearest deighbor (NN), and the distance to the NN (also know as minimum interspecific)

In [4]:
nominal.print_summary()
Out[4]:
Mean Max NN DtoNN
LS_brn 0.000000 0.00000 LS_obt 6.77516
LS_con 2.127067 3.98825 LS_obt 5.60360
LS_elo 0.037100 0.16695 LS_obt 2.73593
LS_gar 0.000000 0.00000 LS_obt 7.68126
LS_mac 0.903538 1.85854 LS_tri 4.51779
LS_muy 7.655915 15.31183 LS_tri 7.47916
LS_obt 1.937511 6.71724 LS_elo 2.73593
LS_piv 0.266286 1.00758 LS_obt 2.90372
LS_rei 0.316828 0.70177 LS_con 6.14484
LS_tri 3.618149 6.33176 LS_mac 4.51779
In [5]:
nominal.print_summary_all()
Out[5]:
minimum mean maximum
intra 0.00000 1.68624 15.31183
inter 2.73593 9.49266 15.15938

We can plot the maximum intraspecific vs the minimum interspecific of each taxa. The taxa with minimum interspecific higher than the maximum intraspecific are below the diagonal line.

In [6]:
nominal.plot_max_min()

Also, we can obtain a barcoding gap plot. In case of barcoding gap there will be not superposition among intraspecific and interspecific distances

In [7]:
nominal.plot_freq()

If you want to explore the data, you can plot a heatmap of pairwsise genetic distance. Note the scale!

In [8]:
nominal.plot_heatmap()

In some dataset the range of values are so wide that the difference in the lower values could be masked, so you can use a upper threshold on your scale. In this case the existence of divergente inidividuals in LS_obt is revealed.

In [9]:
nominal.plot_heatmap(upper=4)

Chaging the mutational model¶

SPdel calculates genetic distances using a Kimura 2-parameters (K2p) model as default, but also you can set a p-distance model.

In [10]:
nominal=SPdel.run_nominal(basepath,Inputs,dis='p')
#####################
Nominal MOTUs
#####################

#####LS_brn#####
LS_brn_L930, LS_brn_L931, LS_brn_L932

#####LS_con#####
LS_con_L210, LS_con_L211, LS_con_L286, LS_con_L291, LS_con_L292, LS_con_L820

#####LS_elo#####
LS_elo_L1041, LS_elo_L1046, LS_elo_L1047, LS_elo_L287, LS_elo_L300, LS_elo_L305, LS_elo_L307, LS_elo_L308, LS_elo_L309

#####LS_gar#####
LS_gar_L293, LS_gar_L294, LS_gar_L295, LS_gar_L296, LS_gar_L298

#####LS_mac#####
LS_mac_B061, LS_mac_B082, LS_mac_B086, LS_mac_B087, LS_mac_B088, LS_mac_B089, LS_mac_L083, LS_mac_L085, LS_mac_L178, LS_mac_L212, LS_mac_L225, LS_mac_L290, LS_mac_L890, LS_mac_L891

#####LS_muy#####
LS_muy_L907, LS_muy_L913, LS_muy_L914, LS_muy_L915

#####LS_obt#####
LS_obt_B031, LS_obt_B070, LS_obt_B071, LS_obt_B074, LS_obt_B075, LS_obt_B077, LS_obt_B090, LS_obt_B100, LS_obt_B101, LS_obt_B102, LS_obt_B103, LS_obt_L004, LS_obt_L007, LS_obt_L008, LS_obt_L009, LS_obt_L013, LS_obt_L016, LS_obt_L084, LS_obt_L253, LS_obt_L266, LS_obt_L267, LS_obt_L268, LS_obt_L269, LS_obt_L282, LS_obt_L283, LS_obt_L314, LS_obt_L315, LS_obt_L316, LS_obt_L320, LS_obt_L547, LS_obt_L548

#####LS_piv#####
LS_piv_B060, LS_piv_B076, LS_piv_B078, LS_piv_B093, LS_piv_B094, LS_piv_B095, LS_piv_B096, LS_piv_B097, LS_piv_B098, LS_piv_B099, LS_piv_B140, LS_piv_B141, LS_piv_B142, LS_piv_B144, LS_piv_L002, LS_piv_L003, LS_piv_L005, LS_piv_L006, LS_piv_L010, LS_piv_L011, LS_piv_L014, LS_piv_L284, LS_piv_L371

#####LS_rei#####
LS_rei_B072, LS_rei_B073, LS_rei_L338, LS_rei_L339, LS_rei_L341, LS_rei_L342, LS_rei_L343, LS_rei_L353, LS_rei_L355, LS_rei_L773, LS_rei_L777, LS_rei_L778, LS_rei_L779

#####LS_tri#####
LS_tri_L179, LS_tri_L180, LS_tri_L182, LS_tri_L519, LS_tri_L618, LS_tri_L621, LS_tri_L690, LS_tri_L955


Using p-distance

In [11]:
nominal.print_summary()
Out[11]:
Mean Max NN DtoNN
LS_brn 0.000000 0.00000 LS_obt 6.37066
LS_con 2.044443 3.83333 LS_obt 5.33333
LS_elo 0.037038 0.16667 LS_obt 2.66667
LS_gar 0.000000 0.00000 LS_obt 7.15503
LS_mac 0.891604 1.83333 LS_tri 4.33333
LS_muy 6.750000 13.50000 LS_tri 7.00000
LS_obt 1.872420 6.33333 LS_elo 2.66667
LS_piv 0.264893 1.00000 LS_obt 2.83333
LS_rei 0.314661 0.69686 LS_con 5.83333
LS_tri 3.428571 6.00000 LS_mac 4.33333

Checking for at least one ORF without stop codon¶

The most used DNA barcoding marker is COI, a codificant gene. So, its important for quality check to test if there is at leat one ORF without stop codons. In this case you can use CODE='VER' for mitonchondrial vertebrate or CODE='INV' for mithocondrial invertebrate o using the number code from NCBI https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c

In [12]:
Inputs=SPdel.reading_data(fasta,CODE='VER')
Inputs.stop_codon()

############################################################################

SPdel v2.0 - Species delimitation and statistics for DNA Barcoding data sets

############################################################################

Sequences are aligned (same size)
Fasta file with 116 sequences and 600 base pairs
Checking stop codons using VER genetic code
C:\Users\JORGE\.conda\envs\spdel_git4\lib\site-packages\Bio\Seq.py:2808: BiopythonWarning:

Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.

All sequences have at least one ORF without stop codons
In [13]:
Inputs=SPdel.reading_data(fasta,CODE=2)
Inputs.stop_codon()

############################################################################

SPdel v2.0 - Species delimitation and statistics for DNA Barcoding data sets

############################################################################

Sequences are aligned (same size)
Fasta file with 116 sequences and 600 base pairs
Checking stop codons using 2 genetic code
All sequences have at least one ORF without stop codons

Analyzing data using PTP species delimitation¶

With SPdel you can perform single-gene species delimitation methods. In this case we will calulate MOTUs using PTP method (Zhang et al., 2013) and obtain the same graphics that the nominal analysis! Note the difference in the graphics.

In [14]:
fasta = './data/Megaleporinus/Megaleporinus_COI.fasta'
tree = './data/Megaleporinus/Megaleporinus_tree.nwk'
basepath=os.path.dirname(fasta)
Inputs=SPdel.reading_data(fasta,tree)

############################################################################

SPdel v2.0 - Species delimitation and statistics for DNA Barcoding data sets

############################################################################

Sequences are aligned (same size)
Fasta file with 116 sequences and 600 base pairs
In [15]:
PTP=SPdel.run_PTP(basepath,Inputs)
generated new fontManager
Speciation rate: 43.602
Coalesecnt rate: 1666.347
Null logl: 959.787
MAX logl: 1249.664
P-value: 0.000
Kolmogorov-Smirnov test for model fitting:
Speciation: Dtest = 0.539 p-value >= 0.1 excellent model fitting
Coalescent: Dtest = 2.220 p-value < 0.01 poor model fitting
Number of species: 18


#####################
PTP MOTUs
#####################

#####MOTU_01#####
LS_muy_L913, LS_muy_L915, LS_muy_L914

#####MOTU_02#####
LS_muy_L907

#####MOTU_03#####
LS_brn_L930, LS_brn_L931, LS_brn_L932

#####MOTU_04#####
LS_gar_L293, LS_gar_L298, LS_gar_L294, LS_gar_L296, LS_gar_L295

#####MOTU_05#####
LS_tri_L179, LS_tri_L180, LS_tri_L182

#####MOTU_06#####
LS_con_L286, LS_con_L820, LS_con_L291, LS_con_L292

#####MOTU_07#####
LS_con_L210, LS_con_L211

#####MOTU_08#####
LS_tri_L519, LS_tri_L690, LS_tri_L618, LS_tri_L955, LS_tri_L621

#####MOTU_09#####
LS_obt_B074, LS_obt_L016, LS_obt_B077, LS_obt_B075, LS_obt_L009, LS_obt_L548, LS_obt_L004, LS_obt_L013, LS_obt_L283, LS_obt_L282, LS_obt_L547, LS_obt_L007, LS_obt_L008, LS_obt_B100, LS_obt_B103, LS_obt_B101, LS_obt_B102

#####MOTU_10#####
LS_elo_L1041, LS_elo_L287, LS_elo_L309, LS_elo_L1046, LS_elo_L300, LS_elo_L305, LS_elo_L307, LS_elo_L308, LS_elo_L1047

#####MOTU_11#####
LS_obt_B031, LS_obt_B090, LS_obt_B070, LS_obt_L268, LS_obt_L266, LS_obt_L316, LS_obt_B071, LS_obt_L267, LS_obt_L269, LS_obt_L253, LS_obt_L314, LS_obt_L315, LS_obt_L320

#####MOTU_12#####
LS_obt_L084

#####MOTU_13#####
LS_mac_B061, LS_mac_B086, LS_mac_B089, LS_mac_B087, LS_mac_B088

#####MOTU_14#####
LS_mac_B082, LS_mac_L890, LS_mac_L083, LS_mac_L085, LS_mac_L178, LS_mac_L225, LS_mac_L212, LS_mac_L891, LS_mac_L290

#####MOTU_15#####
LS_rei_L773, LS_rei_L777, LS_rei_L778, LS_rei_L779

#####MOTU_16#####
LS_rei_B072, LS_rei_L342, LS_rei_B073, LS_rei_L341, LS_rei_L338, LS_rei_L355, LS_rei_L339, LS_rei_L343, LS_rei_L353

#####MOTU_17#####
LS_piv_B140, LS_piv_B141, LS_piv_B142, LS_piv_B144

#####MOTU_18#####
LS_piv_B060, LS_piv_B095, LS_piv_B093, LS_piv_L002, LS_piv_L003, LS_piv_L011, LS_piv_L010, LS_piv_B076, LS_piv_B094, LS_piv_L284, LS_piv_L371, LS_piv_B078, LS_piv_L005, LS_piv_L006, LS_piv_L014, LS_piv_B096, LS_piv_B097, LS_piv_B098, LS_piv_B099


Using k2p distance

In [16]:
PTP.print_summary()
Out[16]:
Mean Max NN DtoNN
MOTU_01 0.000000 0.00000 MOTU_03 11.60407
MOTU_02 NaN NaN MOTU_08 7.47916
MOTU_03 0.000000 0.00000 MOTU_09 6.77516
MOTU_04 0.000000 0.00000 MOTU_09 7.68126
MOTU_05 0.000000 0.00000 MOTU_08 6.33176
MOTU_06 0.000000 0.00000 MOTU_07 3.98825
MOTU_07 0.000000 0.00000 MOTU_06 3.98825
MOTU_08 0.000000 0.00000 MOTU_14 4.51779
MOTU_09 0.143081 0.50167 MOTU_11 2.84291
MOTU_10 0.037100 0.16695 MOTU_11 2.73593
MOTU_11 0.000000 0.00000 MOTU_10 2.73593
MOTU_12 NaN NaN MOTU_17 2.90372
MOTU_13 0.136404 0.34101 MOTU_14 1.55005
MOTU_14 0.000000 0.00000 MOTU_13 1.55005
MOTU_15 0.000000 0.00000 MOTU_16 0.67115
MOTU_16 0.000000 0.00000 MOTU_15 0.67115
MOTU_17 0.083475 0.16695 MOTU_18 0.67024
MOTU_18 0.061102 0.16772 MOTU_17 0.67024
In [17]:
PTP.plot_max_min()
In [18]:
PTP.plot_freq() 
In [19]:
PTP.plot_heatmap()
In [20]:
PTP.plot_heatmap(upper=4)

Using precalculated PTP result¶

Also, you can calculate the PTP using the bPTP web server (https://species.h-its.org/) and use the output file.

In [21]:
fasta = './data/Megaleporinus/Megaleporinus_COI.fasta'
PTP_out= './data/Megaleporinus/PTP.PTPhSupportPartition.txt'
basepath=os.path.dirname(fasta)
Inputs=SPdel.reading_data(fasta)

############################################################################

SPdel v2.0 - Species delimitation and statistics for DNA Barcoding data sets

############################################################################

Sequences are aligned (same size)
Fasta file with 116 sequences and 600 base pairs
In [22]:
PTP=SPdel.run_PTPList(basepath,Inputs,PTP_out)
#####################
PTP MOTUs
#####################

#####MOTU_01#####
LS_muy_L913, LS_muy_L915, LS_muy_L914

#####MOTU_02#####
LS_muy_L907

#####MOTU_03#####
LS_brn_L930, LS_brn_L931, LS_brn_L932

#####MOTU_04#####
LS_gar_L293, LS_gar_L298, LS_gar_L294, LS_gar_L296, LS_gar_L295

#####MOTU_05#####
LS_tri_L179, LS_tri_L180, LS_tri_L182

#####MOTU_06#####
LS_con_L286, LS_con_L820, LS_con_L291, LS_con_L292

#####MOTU_07#####
LS_con_L210, LS_con_L211

#####MOTU_08#####
LS_tri_L519, LS_tri_L690, LS_tri_L618, LS_tri_L955, LS_tri_L621

#####MOTU_09#####
LS_obt_B074, LS_obt_L016, LS_obt_B077, LS_obt_B075, LS_obt_L009, LS_obt_L548, LS_obt_L004, LS_obt_L013, LS_obt_L283, LS_obt_L282, LS_obt_L547, LS_obt_L007, LS_obt_L008, LS_obt_B100, LS_obt_B103, LS_obt_B101, LS_obt_B102

#####MOTU_10#####
LS_elo_L1041, LS_elo_L287, LS_elo_L309, LS_elo_L1046, LS_elo_L300, LS_elo_L305, LS_elo_L307, LS_elo_L308, LS_elo_L1047

#####MOTU_11#####
LS_obt_B031, LS_obt_B090, LS_obt_B070, LS_obt_L268, LS_obt_L266, LS_obt_L316, LS_obt_B071, LS_obt_L267, LS_obt_L269, LS_obt_L253, LS_obt_L314, LS_obt_L315, LS_obt_L320

#####MOTU_12#####
LS_obt_L084

#####MOTU_13#####
LS_mac_B061, LS_mac_B086, LS_mac_B089, LS_mac_B087, LS_mac_B088

#####MOTU_14#####
LS_mac_B082, LS_mac_L890, LS_mac_L083, LS_mac_L085, LS_mac_L178, LS_mac_L225, LS_mac_L212, LS_mac_L891, LS_mac_L290

#####MOTU_15#####
LS_rei_L773, LS_rei_L777, LS_rei_L778, LS_rei_L779

#####MOTU_16#####
LS_rei_B072, LS_rei_L342, LS_rei_B073, LS_rei_L341, LS_rei_L338, LS_rei_L355, LS_rei_L339, LS_rei_L343, LS_rei_L353

#####MOTU_17#####
LS_piv_B140, LS_piv_B141, LS_piv_B142, LS_piv_B144

#####MOTU_18#####
LS_piv_B060, LS_piv_B095, LS_piv_B093, LS_piv_L002, LS_piv_L003, LS_piv_L011, LS_piv_L010, LS_piv_B076, LS_piv_B094, LS_piv_L284, LS_piv_L371, LS_piv_B078, LS_piv_L005, LS_piv_L006, LS_piv_L014, LS_piv_B096, LS_piv_B097, LS_piv_B098, LS_piv_B099


Using k2p distance

In [23]:
PTP.print_summary()
Out[23]:
Mean Max NN DtoNN
MOTU_01 0.000000 0.00000 MOTU_03 11.60407
MOTU_02 NaN NaN MOTU_08 7.47916
MOTU_03 0.000000 0.00000 MOTU_09 6.77516
MOTU_04 0.000000 0.00000 MOTU_09 7.68126
MOTU_05 0.000000 0.00000 MOTU_08 6.33176
MOTU_06 0.000000 0.00000 MOTU_07 3.98825
MOTU_07 0.000000 0.00000 MOTU_06 3.98825
MOTU_08 0.000000 0.00000 MOTU_14 4.51779
MOTU_09 0.143081 0.50167 MOTU_11 2.84291
MOTU_10 0.037100 0.16695 MOTU_11 2.73593
MOTU_11 0.000000 0.00000 MOTU_10 2.73593
MOTU_12 NaN NaN MOTU_17 2.90372
MOTU_13 0.136404 0.34101 MOTU_14 1.55005
MOTU_14 0.000000 0.00000 MOTU_13 1.55005
MOTU_15 0.000000 0.00000 MOTU_16 0.67115
MOTU_16 0.000000 0.00000 MOTU_15 0.67115
MOTU_17 0.083475 0.16695 MOTU_18 0.67024
MOTU_18 0.061102 0.16772 MOTU_17 0.67024
In [24]:
PTP.plot_max_min()
In [25]:
PTP.plot_freq() 
In [26]:
PTP.plot_heatmap()

Analyzing data using bPTP species delimitation¶

In this case we will calulate MOTUs using bPTP species delimitaiton method (Zhang et al., 2013) and obtain the same graphics that the nominal analysis! Note the difference in the graphics.

In [27]:
fasta = './data/Megaleporinus/Megaleporinus_COI.fasta'
tree = './data/Megaleporinus/Megaleporinus_tree.nwk'
basepath=os.path.dirname(fasta)
Inputs=SPdel.reading_data(fasta,tree)

############################################################################

SPdel v2.0 - Species delimitation and statistics for DNA Barcoding data sets

############################################################################

Sequences are aligned (same size)
Fasta file with 116 sequences and 600 base pairs
In [28]:
bPTP=SPdel.run_bPTP(basepath,Inputs)
Estimated number of species is between 17 and 21
Mean: 18.56

bPTP finished running with the following parameters:
 MCMC iterations:................10000
 MCMC sampling interval:.........100
 MCMC burn-in:...................0.10
 MCMC seed:......................1234


#####################
bPTP MOTUs
#####################

#####MOTU_01#####
LS_muy_L913, LS_muy_L915, LS_muy_L914

#####MOTU_02#####
LS_muy_L907

#####MOTU_03#####
LS_brn_L930, LS_brn_L931, LS_brn_L932

#####MOTU_04#####
LS_gar_L293, LS_gar_L298, LS_gar_L294, LS_gar_L296, LS_gar_L295

#####MOTU_05#####
LS_tri_L179, LS_tri_L180, LS_tri_L182

#####MOTU_06#####
LS_con_L286, LS_con_L820, LS_con_L291, LS_con_L292

#####MOTU_07#####
LS_con_L210, LS_con_L211

#####MOTU_08#####
LS_tri_L519, LS_tri_L690, LS_tri_L618, LS_tri_L955, LS_tri_L621

#####MOTU_09#####
LS_obt_B074, LS_obt_L016, LS_obt_B077, LS_obt_B075, LS_obt_L009, LS_obt_L548, LS_obt_L004, LS_obt_L013, LS_obt_L283, LS_obt_L282, LS_obt_L547, LS_obt_L007, LS_obt_L008, LS_obt_B100, LS_obt_B103, LS_obt_B101, LS_obt_B102

#####MOTU_10#####
LS_elo_L1041, LS_elo_L287, LS_elo_L309, LS_elo_L1046, LS_elo_L300, LS_elo_L305, LS_elo_L307, LS_elo_L308, LS_elo_L1047

#####MOTU_11#####
LS_obt_B031, LS_obt_B090, LS_obt_B070, LS_obt_L268, LS_obt_L266, LS_obt_L316, LS_obt_B071, LS_obt_L267, LS_obt_L269, LS_obt_L253, LS_obt_L314, LS_obt_L315, LS_obt_L320

#####MOTU_12#####
LS_obt_L084

#####MOTU_13#####
LS_mac_B061, LS_mac_B086, LS_mac_B089, LS_mac_B087, LS_mac_B088

#####MOTU_14#####
LS_mac_B082, LS_mac_L890, LS_mac_L083, LS_mac_L085, LS_mac_L178, LS_mac_L225, LS_mac_L212, LS_mac_L891, LS_mac_L290

#####MOTU_15#####
LS_rei_L773, LS_rei_L777, LS_rei_L778, LS_rei_L779

#####MOTU_16#####
LS_rei_B072, LS_rei_L342, LS_rei_B073, LS_rei_L341, LS_rei_L338, LS_rei_L355, LS_rei_L339, LS_rei_L343, LS_rei_L353

#####MOTU_17#####
LS_piv_B140, LS_piv_B141, LS_piv_B142, LS_piv_B144

#####MOTU_18#####
LS_piv_B060, LS_piv_B095, LS_piv_B093, LS_piv_L002, LS_piv_L003, LS_piv_L011, LS_piv_L010, LS_piv_B076, LS_piv_B094, LS_piv_L284, LS_piv_L371, LS_piv_B078, LS_piv_L005, LS_piv_L006, LS_piv_L014, LS_piv_B096, LS_piv_B097, LS_piv_B098, LS_piv_B099


Using k2p distance

<Figure size 640x480 with 0 Axes>
In [29]:
bPTP.print_summary()
Out[29]:
Mean Max NN DtoNN
MOTU_01 0.000000 0.00000 MOTU_03 11.60407
MOTU_02 NaN NaN MOTU_08 7.47916
MOTU_03 0.000000 0.00000 MOTU_09 6.77516
MOTU_04 0.000000 0.00000 MOTU_09 7.68126
MOTU_05 0.000000 0.00000 MOTU_08 6.33176
MOTU_06 0.000000 0.00000 MOTU_07 3.98825
MOTU_07 0.000000 0.00000 MOTU_06 3.98825
MOTU_08 0.000000 0.00000 MOTU_14 4.51779
MOTU_09 0.143081 0.50167 MOTU_11 2.84291
MOTU_10 0.037100 0.16695 MOTU_11 2.73593
MOTU_11 0.000000 0.00000 MOTU_10 2.73593
MOTU_12 NaN NaN MOTU_17 2.90372
MOTU_13 0.136404 0.34101 MOTU_14 1.55005
MOTU_14 0.000000 0.00000 MOTU_13 1.55005
MOTU_15 0.000000 0.00000 MOTU_16 0.67115
MOTU_16 0.000000 0.00000 MOTU_15 0.67115
MOTU_17 0.083475 0.16695 MOTU_18 0.67024
MOTU_18 0.061102 0.16772 MOTU_17 0.67024
In [30]:
bPTP.plot_max_min()
In [31]:
bPTP.plot_freq()
In [32]:
bPTP.plot_heatmap(upper=4)

As bPTP is a bayesian implementation, sometime you would need more iteration. You can change the default niter='10000', sample='100', burnin='0.1' as your convenience.

In [33]:
niter='100000'
sample='1000'
burnin='0.1'
bPTP=SPdel.run_bPTP(basepath,Inputs,niter,sample,burnin)
Estimated number of species is between 17 and 21
Mean: 18.29

bPTP finished running with the following parameters:
 MCMC iterations:................100000
 MCMC sampling interval:.........1000
 MCMC burn-in:...................0.10
 MCMC seed:......................1234


#####################
bPTP MOTUs
#####################

#####MOTU_01#####
LS_muy_L913, LS_muy_L915, LS_muy_L914

#####MOTU_02#####
LS_muy_L907

#####MOTU_03#####
LS_brn_L930, LS_brn_L931, LS_brn_L932

#####MOTU_04#####
LS_gar_L293, LS_gar_L298, LS_gar_L294, LS_gar_L296, LS_gar_L295

#####MOTU_05#####
LS_tri_L179, LS_tri_L180, LS_tri_L182

#####MOTU_06#####
LS_con_L286, LS_con_L820, LS_con_L291, LS_con_L292

#####MOTU_07#####
LS_con_L210, LS_con_L211

#####MOTU_08#####
LS_tri_L519, LS_tri_L690, LS_tri_L618, LS_tri_L955, LS_tri_L621

#####MOTU_09#####
LS_obt_B074, LS_obt_L016, LS_obt_B077, LS_obt_B075, LS_obt_L009, LS_obt_L548, LS_obt_L004, LS_obt_L013, LS_obt_L283, LS_obt_L282, LS_obt_L547, LS_obt_L007, LS_obt_L008, LS_obt_B100, LS_obt_B103, LS_obt_B101, LS_obt_B102

#####MOTU_10#####
LS_elo_L1041, LS_elo_L287, LS_elo_L309, LS_elo_L1046, LS_elo_L300, LS_elo_L305, LS_elo_L307, LS_elo_L308, LS_elo_L1047

#####MOTU_11#####
LS_obt_B031, LS_obt_B090, LS_obt_B070, LS_obt_L268, LS_obt_L266, LS_obt_L316, LS_obt_B071, LS_obt_L267, LS_obt_L269, LS_obt_L253, LS_obt_L314, LS_obt_L315, LS_obt_L320

#####MOTU_12#####
LS_obt_L084

#####MOTU_13#####
LS_mac_B061, LS_mac_B086, LS_mac_B089, LS_mac_B087, LS_mac_B088

#####MOTU_14#####
LS_mac_B082, LS_mac_L890, LS_mac_L083, LS_mac_L085, LS_mac_L178, LS_mac_L225, LS_mac_L212, LS_mac_L891, LS_mac_L290

#####MOTU_15#####
LS_rei_L773, LS_rei_L777, LS_rei_L778, LS_rei_L779

#####MOTU_16#####
LS_rei_B072, LS_rei_L342, LS_rei_B073, LS_rei_L341, LS_rei_L338, LS_rei_L355, LS_rei_L339, LS_rei_L343, LS_rei_L353

#####MOTU_17#####
LS_piv_B140, LS_piv_B141, LS_piv_B142, LS_piv_B144

#####MOTU_18#####
LS_piv_B060, LS_piv_B095, LS_piv_B093, LS_piv_L002, LS_piv_L003, LS_piv_L011, LS_piv_L010, LS_piv_B076, LS_piv_B094, LS_piv_L284, LS_piv_L371, LS_piv_B078, LS_piv_L005, LS_piv_L006, LS_piv_L014, LS_piv_B096, LS_piv_B097, LS_piv_B098, LS_piv_B099


Using k2p distance

<Figure size 640x480 with 0 Axes>

Using precalculated bPTP result¶

Also, you can calculate the bPTP using the bPTP web server (https://species.h-its.org/) and use the output file.

In [34]:
fasta = './data/Megaleporinus/Megaleporinus_COI.fasta'
bPTP_out= './data/Megaleporinus/bPTP.PTPhSupportPartition.txt'
basepath=os.path.dirname(fasta)
Inputs=SPdel.reading_data(fasta)

############################################################################

SPdel v2.0 - Species delimitation and statistics for DNA Barcoding data sets

############################################################################

Sequences are aligned (same size)
Fasta file with 116 sequences and 600 base pairs
In [35]:
bPTP=SPdel.run_bPTPList(basepath,Inputs,bPTP_out)
#####################
bPTP MOTUs
#####################

#####MOTU_01#####
LS_muy_L913, LS_muy_L915, LS_muy_L914

#####MOTU_02#####
LS_muy_L907

#####MOTU_03#####
LS_brn_L930, LS_brn_L931, LS_brn_L932

#####MOTU_04#####
LS_gar_L293, LS_gar_L298, LS_gar_L294, LS_gar_L296, LS_gar_L295

#####MOTU_05#####
LS_tri_L179, LS_tri_L180, LS_tri_L182

#####MOTU_06#####
LS_con_L286, LS_con_L820, LS_con_L291, LS_con_L292

#####MOTU_07#####
LS_con_L210, LS_con_L211

#####MOTU_08#####
LS_tri_L519, LS_tri_L690, LS_tri_L618, LS_tri_L955, LS_tri_L621

#####MOTU_09#####
LS_obt_B074, LS_obt_L016, LS_obt_B077, LS_obt_B075, LS_obt_L009, LS_obt_L548, LS_obt_L004, LS_obt_L013, LS_obt_L283, LS_obt_L282, LS_obt_L547, LS_obt_L007, LS_obt_L008, LS_obt_B100, LS_obt_B103, LS_obt_B101, LS_obt_B102

#####MOTU_10#####
LS_elo_L1041, LS_elo_L287, LS_elo_L309, LS_elo_L1046, LS_elo_L300, LS_elo_L305, LS_elo_L307, LS_elo_L308, LS_elo_L1047

#####MOTU_11#####
LS_obt_B031, LS_obt_B090, LS_obt_B070, LS_obt_L268, LS_obt_L266, LS_obt_L316, LS_obt_B071, LS_obt_L267, LS_obt_L269, LS_obt_L253, LS_obt_L314, LS_obt_L315, LS_obt_L320

#####MOTU_12#####
LS_obt_L084

#####MOTU_13#####
LS_mac_B061, LS_mac_B086, LS_mac_B089, LS_mac_B087, LS_mac_B088

#####MOTU_14#####
LS_mac_B082, LS_mac_L890, LS_mac_L083, LS_mac_L085, LS_mac_L178, LS_mac_L225, LS_mac_L212, LS_mac_L891, LS_mac_L290

#####MOTU_15#####
LS_rei_L773, LS_rei_L777, LS_rei_L778, LS_rei_L779

#####MOTU_16#####
LS_rei_B072, LS_rei_L342, LS_rei_B073, LS_rei_L341, LS_rei_L338, LS_rei_L355, LS_rei_L339, LS_rei_L343, LS_rei_L353

#####MOTU_17#####
LS_piv_B140, LS_piv_B141, LS_piv_B142, LS_piv_B144

#####MOTU_18#####
LS_piv_B060, LS_piv_B095, LS_piv_B093, LS_piv_L002, LS_piv_L003, LS_piv_L011, LS_piv_L010, LS_piv_B076, LS_piv_B094, LS_piv_L284, LS_piv_L371, LS_piv_B078, LS_piv_L005, LS_piv_L006, LS_piv_L014, LS_piv_B096, LS_piv_B097, LS_piv_B098, LS_piv_B099


Using k2p distance

In [36]:
bPTP.print_summary()
Out[36]:
Mean Max NN DtoNN
MOTU_01 0.000000 0.00000 MOTU_03 11.60407
MOTU_02 NaN NaN MOTU_08 7.47916
MOTU_03 0.000000 0.00000 MOTU_09 6.77516
MOTU_04 0.000000 0.00000 MOTU_09 7.68126
MOTU_05 0.000000 0.00000 MOTU_08 6.33176
MOTU_06 0.000000 0.00000 MOTU_07 3.98825
MOTU_07 0.000000 0.00000 MOTU_06 3.98825
MOTU_08 0.000000 0.00000 MOTU_14 4.51779
MOTU_09 0.143081 0.50167 MOTU_11 2.84291
MOTU_10 0.037100 0.16695 MOTU_11 2.73593
MOTU_11 0.000000 0.00000 MOTU_10 2.73593
MOTU_12 NaN NaN MOTU_17 2.90372
MOTU_13 0.136404 0.34101 MOTU_14 1.55005
MOTU_14 0.000000 0.00000 MOTU_13 1.55005
MOTU_15 0.000000 0.00000 MOTU_16 0.67115
MOTU_16 0.000000 0.00000 MOTU_15 0.67115
MOTU_17 0.083475 0.16695 MOTU_18 0.67024
MOTU_18 0.061102 0.16772 MOTU_17 0.67024
In [37]:
bPTP.plot_max_min()
In [38]:
bPTP.plot_freq() 
In [39]:
bPTP.plot_heatmap(upper=4)

Analyzing data using GMYC species delimitation¶

Another method included in SPdel is the GMYC (Pons, et al. 2006). Of course, you can obtain all same metrics and figures as previous methods

In [40]:
fasta = './data/Megaleporinus/Megaleporinus_COI.fasta'
tree = './data/Megaleporinus/Megaleporinus_tree.nwk'
basepath=os.path.dirname(fasta)
Inputs=SPdel.reading_data(fasta,tree)

############################################################################

SPdel v2.0 - Species delimitation and statistics for DNA Barcoding data sets

############################################################################

Sequences are aligned (same size)
Fasta file with 116 sequences and 600 base pairs
In [41]:
GMYC=SPdel.run_GMYC(basepath,Inputs)
Highest llh:1004.3909296154683
Num spe:18
Null llh:968.4341056152681
P-value:2.220446049250313e-16
Final number of estimated species by GMYC: 18

#####################
GMYC MOTUs
#####################

#####MOTU_01#####
LS_obt_B074, LS_obt_L016, LS_obt_B077, LS_obt_B075, LS_obt_L009, LS_obt_L548, LS_obt_L004, LS_obt_L013, LS_obt_L283, LS_obt_L282, LS_obt_L547, LS_obt_L007, LS_obt_L008, LS_obt_B100, LS_obt_B103, LS_obt_B101, LS_obt_B102

#####MOTU_02#####
LS_piv_B060, LS_piv_B095, LS_piv_B093, LS_piv_L002, LS_piv_L003, LS_piv_L011, LS_piv_L010, LS_piv_B076, LS_piv_B094, LS_piv_L284, LS_piv_L371, LS_piv_B078, LS_piv_L005, LS_piv_L006, LS_piv_L014, LS_piv_B096, LS_piv_B097, LS_piv_B098, LS_piv_B099

#####MOTU_03#####
LS_mac_B082, LS_mac_L890, LS_mac_L083, LS_mac_L085, LS_mac_L178, LS_mac_L225, LS_mac_L212, LS_mac_L891, LS_mac_L290

#####MOTU_04#####
LS_obt_B031, LS_obt_B090, LS_obt_B070, LS_obt_L268, LS_obt_L266, LS_obt_L316, LS_obt_B071, LS_obt_L267, LS_obt_L269, LS_obt_L253, LS_obt_L314, LS_obt_L315, LS_obt_L320

#####MOTU_05#####
LS_mac_B061, LS_mac_B086, LS_mac_B089, LS_mac_B087, LS_mac_B088

#####MOTU_06#####
LS_elo_L1041, LS_elo_L287, LS_elo_L309, LS_elo_L1046, LS_elo_L300, LS_elo_L305, LS_elo_L307, LS_elo_L308, LS_elo_L1047

#####MOTU_07#####
LS_rei_B072, LS_rei_L342, LS_rei_B073, LS_rei_L341, LS_rei_L338, LS_rei_L355, LS_rei_L339, LS_rei_L343, LS_rei_L353

#####MOTU_08#####
LS_tri_L519, LS_tri_L690, LS_tri_L618, LS_tri_L955, LS_tri_L621

#####MOTU_09#####
LS_gar_L293, LS_gar_L298, LS_gar_L294, LS_gar_L296, LS_gar_L295

#####MOTU_10#####
LS_piv_B140, LS_piv_B141, LS_piv_B142, LS_piv_B144

#####MOTU_11#####
LS_con_L286, LS_con_L820, LS_con_L291, LS_con_L292

#####MOTU_12#####
LS_rei_L773, LS_rei_L777, LS_rei_L778, LS_rei_L779

#####MOTU_13#####
LS_tri_L179, LS_tri_L180, LS_tri_L182

#####MOTU_14#####
LS_muy_L913, LS_muy_L915, LS_muy_L914

#####MOTU_15#####
LS_brn_L930, LS_brn_L931, LS_brn_L932

#####MOTU_16#####
LS_con_L210, LS_con_L211

#####MOTU_17#####
LS_obt_L084

#####MOTU_18#####
LS_muy_L907


Using k2p distance

In [42]:
GMYC.print_summary()
Out[42]:
Mean Max NN DtoNN
MOTU_01 0.143081 0.50167 MOTU_04 2.84291
MOTU_02 0.058574 0.16772 MOTU_10 0.67024
MOTU_03 0.000000 0.00000 MOTU_05 1.55005
MOTU_04 0.000000 0.00000 MOTU_06 2.73593
MOTU_05 0.136404 0.34101 MOTU_03 1.55005
MOTU_06 0.037100 0.16695 MOTU_04 2.73593
MOTU_07 0.000000 0.00000 MOTU_12 0.67115
MOTU_08 0.000000 0.00000 MOTU_03 4.51779
MOTU_09 0.000000 0.00000 MOTU_01 7.68126
MOTU_10 0.083475 0.16695 MOTU_02 0.67024
MOTU_11 0.000000 0.00000 MOTU_16 3.98825
MOTU_12 0.000000 0.00000 MOTU_07 0.67115
MOTU_13 0.000000 0.00000 MOTU_08 6.33176
MOTU_14 0.000000 0.00000 MOTU_15 11.60407
MOTU_15 0.000000 0.00000 MOTU_01 6.77516
MOTU_16 0.000000 0.00000 MOTU_11 3.98825
MOTU_17 NaN NaN MOTU_10 2.90372
MOTU_18 NaN NaN NaN
In [43]:
GMYC.plot_max_min()
In [44]:
GMYC.plot_freq() 
In [45]:
GMYC.plot_heatmap(upper=4)

Using precalculated GMYC result¶

In [46]:
fasta = './data/Megaleporinus/Megaleporinus_COI.fasta'
GMYC_out= './data/Megaleporinus/GMYC_MOTU.txt'
basepath=os.path.dirname(fasta)
Inputs=SPdel.reading_data(fasta)

############################################################################

SPdel v2.0 - Species delimitation and statistics for DNA Barcoding data sets

############################################################################

Sequences are aligned (same size)
Fasta file with 116 sequences and 600 base pairs
In [47]:
GMYC=SPdel.run_GMYCList(basepath,Inputs,GMYC_out)
#####################
GMYC MOTUs
#####################

#####MOTU_01#####
LS_obt_B074, LS_obt_L016, LS_obt_B077, LS_obt_B075, LS_obt_L009, LS_obt_L548, LS_obt_L004, LS_obt_L013, LS_obt_L283, LS_obt_L282, LS_obt_L547, LS_obt_L007, LS_obt_L008, LS_obt_B100, LS_obt_B103, LS_obt_B101, LS_obt_B102

#####MOTU_02#####
LS_piv_B060, LS_piv_B095, LS_piv_B093, LS_piv_L002, LS_piv_L003, LS_piv_L011, LS_piv_L010, LS_piv_B076, LS_piv_B094, LS_piv_L284, LS_piv_L371, LS_piv_B078, LS_piv_L005, LS_piv_L006, LS_piv_L014, LS_piv_B096, LS_piv_B097, LS_piv_B098, LS_piv_B099

#####MOTU_03#####
LS_mac_B082, LS_mac_L890, LS_mac_L083, LS_mac_L085, LS_mac_L178, LS_mac_L225, LS_mac_L212, LS_mac_L891, LS_mac_L290

#####MOTU_04#####
LS_obt_B031, LS_obt_B090, LS_obt_B070, LS_obt_L268, LS_obt_L266, LS_obt_L316, LS_obt_B071, LS_obt_L267, LS_obt_L269, LS_obt_L253, LS_obt_L314, LS_obt_L315, LS_obt_L320

#####MOTU_05#####
LS_mac_B061, LS_mac_B086, LS_mac_B089, LS_mac_B087, LS_mac_B088

#####MOTU_06#####
LS_elo_L1041, LS_elo_L287, LS_elo_L309, LS_elo_L1046, LS_elo_L300, LS_elo_L305, LS_elo_L307, LS_elo_L308, LS_elo_L1047

#####MOTU_07#####
LS_rei_B072, LS_rei_L342, LS_rei_B073, LS_rei_L341, LS_rei_L338, LS_rei_L355, LS_rei_L339, LS_rei_L343, LS_rei_L353

#####MOTU_08#####
LS_tri_L519, LS_tri_L690, LS_tri_L618, LS_tri_L955, LS_tri_L621

#####MOTU_09#####
LS_gar_L293, LS_gar_L298, LS_gar_L294, LS_gar_L296, LS_gar_L295

#####MOTU_10#####
LS_piv_B140, LS_piv_B141, LS_piv_B142, LS_piv_B144

#####MOTU_11#####
LS_con_L286, LS_con_L820, LS_con_L291, LS_con_L292

#####MOTU_12#####
LS_rei_L773, LS_rei_L777, LS_rei_L778, LS_rei_L779

#####MOTU_13#####
LS_tri_L179, LS_tri_L180, LS_tri_L182

#####MOTU_14#####
LS_muy_L913, LS_muy_L915, LS_muy_L914

#####MOTU_15#####
LS_brn_L930, LS_brn_L931, LS_brn_L932

#####MOTU_16#####
LS_con_L210, LS_con_L211

#####MOTU_17#####
LS_obt_L084

#####MOTU_18#####
LS_muy_L907


Using k2p distance

In [48]:
GMYC.print_summary()
Out[48]:
Mean Max NN DtoNN
MOTU_01 0.143081 0.50167 MOTU_04 2.84291
MOTU_02 0.058574 0.16772 MOTU_10 0.67024
MOTU_03 0.000000 0.00000 MOTU_05 1.55005
MOTU_04 0.000000 0.00000 MOTU_06 2.73593
MOTU_05 0.136404 0.34101 MOTU_03 1.55005
MOTU_06 0.037100 0.16695 MOTU_04 2.73593
MOTU_07 0.000000 0.00000 MOTU_12 0.67115
MOTU_08 0.000000 0.00000 MOTU_03 4.51779
MOTU_09 0.000000 0.00000 MOTU_01 7.68126
MOTU_10 0.083475 0.16695 MOTU_02 0.67024
MOTU_11 0.000000 0.00000 MOTU_16 3.98825
MOTU_12 0.000000 0.00000 MOTU_07 0.67115
MOTU_13 0.000000 0.00000 MOTU_08 6.33176
MOTU_14 0.000000 0.00000 MOTU_15 11.60407
MOTU_15 0.000000 0.00000 MOTU_01 6.77516
MOTU_16 0.000000 0.00000 MOTU_11 3.98825
MOTU_17 NaN NaN MOTU_10 2.90372
MOTU_18 NaN NaN NaN
In [49]:
GMYC.plot_max_min()
In [50]:
GMYC.plot_freq() 
In [51]:
GMYC.plot_heatmap()

Analyzing data using any species delimitation precalculate from a csv file¶

Here you can provide a CSV file with several delimitation results. Note that CSV need to have a head row with the analyses names, that should be used then for print tables and figures.

In [52]:
fasta = './data/Megaleporinus/Megaleporinus_COI.fasta'
csv_file= './data/Megaleporinus/BIN_list.csv'
basepath=os.path.dirname(fasta)
Inputs=SPdel.reading_data(fasta)

############################################################################

SPdel v2.0 - Species delimitation and statistics for DNA Barcoding data sets

############################################################################

Sequences are aligned (same size)
Fasta file with 116 sequences and 600 base pairs
In [53]:
csv_motus=SPdel.run_csvList(basepath,Inputs,csv_file)
#####################
BIN MOTUs
#####################

#####MOTU_AAB8569#####
LS_piv_B060, LS_piv_B076, LS_piv_B078, LS_piv_B093, LS_piv_B094, LS_piv_B095, LS_piv_B096, LS_piv_B097, LS_piv_B098, LS_piv_B099, LS_piv_B140, LS_piv_B141, LS_piv_B142, LS_piv_B144, LS_piv_L002, LS_piv_L003, LS_piv_L005, LS_piv_L006, LS_piv_L010, LS_piv_L011, LS_piv_L014, LS_piv_L284, LS_piv_L371

#####MOTU_AAB8578#####
LS_obt_B074, LS_obt_B075, LS_obt_B077, LS_obt_B100, LS_obt_B101, LS_obt_B102, LS_obt_B103, LS_obt_L004, LS_obt_L007, LS_obt_L008, LS_obt_L009, LS_obt_L013, LS_obt_L016, LS_obt_L282, LS_obt_L283, LS_obt_L547, LS_obt_L548

#####MOTU_AAD1729#####
LS_rei_B072, LS_rei_B073, LS_rei_L338, LS_rei_L339, LS_rei_L341, LS_rei_L342, LS_rei_L343, LS_rei_L353, LS_rei_L355, LS_rei_L773, LS_rei_L777, LS_rei_L778, LS_rei_L779

#####MOTU_AAE5328#####
LS_mac_B082, LS_mac_L083, LS_mac_L085, LS_mac_L178, LS_mac_L212, LS_mac_L225, LS_mac_L290, LS_mac_L890, LS_mac_L891

#####MOTU_ABY2894#####
LS_elo_L1041, LS_elo_L1046, LS_elo_L1047, LS_elo_L287, LS_elo_L300, LS_elo_L305, LS_elo_L307, LS_elo_L308, LS_elo_L309

#####MOTU_ABZ0928#####
LS_obt_B031, LS_obt_B070, LS_obt_B071, LS_obt_B090, LS_obt_L253, LS_obt_L266, LS_obt_L267, LS_obt_L268, LS_obt_L269, LS_obt_L314, LS_obt_L315, LS_obt_L316, LS_obt_L320

#####MOTU_ACL3073#####
LS_tri_L519, LS_tri_L618, LS_tri_L621, LS_tri_L690, LS_tri_L955

#####MOTU_ACL3074#####
LS_tri_L179, LS_tri_L180, LS_tri_L182

#####MOTU_ACL3227#####
LS_gar_L293, LS_gar_L294, LS_gar_L295, LS_gar_L296, LS_gar_L298

#####MOTU_ACL3731#####
LS_con_L210, LS_con_L211

#####MOTU_ACL3942#####
LS_obt_L084

#####MOTU_ACL4264#####
LS_con_L286, LS_con_L291, LS_con_L292, LS_con_L820

#####MOTU_ACO1303#####
LS_mac_B061, LS_mac_B086, LS_mac_B087, LS_mac_B088, LS_mac_B089

#####MOTU_ADB0463#####
LS_brn_L930, LS_brn_L931, LS_brn_L932

#####MOTU_ADB0512#####
LS_muy_L907

#####MOTU_ADB0701#####
LS_muy_L913, LS_muy_L914, LS_muy_L915


Using k2p distance

Note here the use of the analysis name ('BIN') to call the function

In [54]:
csv_motus['BIN'].print_summary()
Out[54]:
Mean Max NN DtoNN
MOTU_AAB8569 0.266286 1.00758 MOTU_ACL3942 2.90372
MOTU_AAB8578 0.143081 0.50167 MOTU_ABZ0928 2.84291
MOTU_AAD1729 0.316828 0.70177 MOTU_ACL4264 6.14484
MOTU_AAE5328 0.000000 0.00000 MOTU_ACO1303 1.55005
MOTU_ABY2894 0.037100 0.16695 MOTU_ABZ0928 2.73593
MOTU_ABZ0928 0.000000 0.00000 MOTU_ABY2894 2.73593
MOTU_ACL3073 0.000000 0.00000 MOTU_AAE5328 4.51779
MOTU_ACL3074 0.000000 0.00000 MOTU_ACL3073 6.33176
MOTU_ACL3227 0.000000 0.00000 MOTU_AAB8578 7.68126
MOTU_ACL3731 0.000000 0.00000 MOTU_ACL4264 3.98825
MOTU_ACL3942 NaN NaN MOTU_AAB8569 2.90372
MOTU_ACL4264 0.000000 0.00000 MOTU_ACL3731 3.98825
MOTU_ACO1303 0.136404 0.34101 MOTU_AAE5328 1.55005
MOTU_ADB0463 0.000000 0.00000 MOTU_AAB8578 6.77516
MOTU_ADB0512 NaN NaN MOTU_ACL3073 7.47916
MOTU_ADB0701 0.000000 0.00000 MOTU_ADB0463 11.60407
In [55]:
csv_motus['BIN'].plot_max_min()
In [56]:
csv_motus['BIN'].plot_freq() 
In [57]:
csv_motus['BIN'].plot_heatmap(upper=4)

Comparing the different species delimitation results¶

One of the main function of SPdel is to compare the diferent delimitation methods and produce Consensus MOTUS. We encourage researcher to include different methods, with different philosophies to strengthen the analysis. Also, your can obtain the same graphics as the other methods and aditionally a tree figure including all methods compared.

In [58]:
fasta = './data/Megaleporinus/Megaleporinus_COI.fasta'
tree = './data/Megaleporinus/Megaleporinus_tree.nwk'
basepath=os.path.dirname(fasta)
Compare_list = 'All'
Inputs=SPdel.reading_data(fasta,tree)

############################################################################

SPdel v2.0 - Species delimitation and statistics for DNA Barcoding data sets

############################################################################

Sequences are aligned (same size)
Fasta file with 116 sequences and 600 base pairs

Note that whe Consensus MOTU are classified according their congruence between methods and the nominal taxonomy

In [59]:
compared=SPdel.run_comparison(basepath,Inputs,Compare_list)
#####################
 Consensus MOTUs
#####################

### All MOTUs match with the taxonomy ###

Consensus MOTU 1 [LS_brn_(Nominal)&MOTU_ADB0463_(BIN)&MOTU_03_(bPTP)&MOTU_15_(GMYC)&MOTU_03_(PTP)]
LS_brn_L930, LS_brn_L931, LS_brn_L932

Consensus MOTU 2 [LS_elo_(Nominal)&MOTU_ABY2894_(BIN)&MOTU_10_(bPTP)&MOTU_06_(GMYC)&MOTU_10_(PTP)]
LS_elo_L1041, LS_elo_L1046, LS_elo_L1047, LS_elo_L287, LS_elo_L300, LS_elo_L305, LS_elo_L307, LS_elo_L308, LS_elo_L309

Consensus MOTU 3 [LS_gar_(Nominal)&MOTU_ACL3227_(BIN)&MOTU_04_(bPTP)&MOTU_09_(GMYC)&MOTU_04_(PTP)]
LS_gar_L293, LS_gar_L294, LS_gar_L295, LS_gar_L296, LS_gar_L298

### Most MOTUs agree with the taxonomy ###

### All MOTUs do not match the taxonomy###

Consensus MOTU 4 [MOTU_AAB8578_(BIN)&MOTU_09_(bPTP)&MOTU_01_(GMYC)&MOTU_09_(PTP)]
LS_obt_B074, LS_obt_B075, LS_obt_B077, LS_obt_B100, LS_obt_B101, LS_obt_B102, LS_obt_B103, LS_obt_L004, LS_obt_L007, LS_obt_L008, LS_obt_L009, LS_obt_L013, LS_obt_L016, LS_obt_L282, LS_obt_L283, LS_obt_L547, LS_obt_L548

Consensus MOTU 5 [MOTU_AAE5328_(BIN)&MOTU_14_(bPTP)&MOTU_03_(GMYC)&MOTU_14_(PTP)]
LS_mac_B082, LS_mac_L083, LS_mac_L085, LS_mac_L178, LS_mac_L212, LS_mac_L225, LS_mac_L290, LS_mac_L890, LS_mac_L891

Consensus MOTU 6 [MOTU_ABZ0928_(BIN)&MOTU_11_(bPTP)&MOTU_04_(GMYC)&MOTU_11_(PTP)]
LS_obt_B031, LS_obt_B070, LS_obt_B071, LS_obt_B090, LS_obt_L253, LS_obt_L266, LS_obt_L267, LS_obt_L268, LS_obt_L269, LS_obt_L314, LS_obt_L315, LS_obt_L316, LS_obt_L320

Consensus MOTU 7 [MOTU_ACL3073_(BIN)&MOTU_08_(bPTP)&MOTU_08_(GMYC)&MOTU_08_(PTP)]
LS_tri_L519, LS_tri_L618, LS_tri_L621, LS_tri_L690, LS_tri_L955

Consensus MOTU 8 [MOTU_ACL3074_(BIN)&MOTU_05_(bPTP)&MOTU_13_(GMYC)&MOTU_05_(PTP)]
LS_tri_L179, LS_tri_L180, LS_tri_L182

Consensus MOTU 9 [MOTU_ACL3731_(BIN)&MOTU_07_(bPTP)&MOTU_16_(GMYC)&MOTU_07_(PTP)]
LS_con_L210, LS_con_L211

Consensus MOTU 10 [MOTU_ACL3942_(BIN)&MOTU_12_(bPTP)&MOTU_17_(GMYC)&MOTU_12_(PTP)]
LS_obt_L084

Consensus MOTU 11 [MOTU_ACL4264_(BIN)&MOTU_06_(bPTP)&MOTU_11_(GMYC)&MOTU_06_(PTP)]
LS_con_L286, LS_con_L291, LS_con_L292, LS_con_L820

Consensus MOTU 12 [MOTU_ACO1303_(BIN)&MOTU_13_(bPTP)&MOTU_05_(GMYC)&MOTU_13_(PTP)]
LS_mac_B061, LS_mac_B086, LS_mac_B087, LS_mac_B088, LS_mac_B089

Consensus MOTU 13 [MOTU_ADB0512_(BIN)&MOTU_02_(bPTP)&MOTU_18_(GMYC)&MOTU_02_(PTP)]
LS_muy_L907

Consensus MOTU 14 [MOTU_ADB0701_(BIN)&MOTU_01_(bPTP)&MOTU_14_(GMYC)&MOTU_01_(PTP)]
LS_muy_L913, LS_muy_L914, LS_muy_L915

### Most MOTUs do not match the taxonomy ###

Consensus MOTU 15 [MOTU_15_(bPTP)&MOTU_12_(GMYC)&MOTU_15_(PTP)]
LS_rei_L773, LS_rei_L777, LS_rei_L778, LS_rei_L779

Consensus MOTU 16 [MOTU_16_(bPTP)&MOTU_07_(GMYC)&MOTU_16_(PTP)]
LS_rei_B072, LS_rei_L342, LS_rei_B073, LS_rei_L341, LS_rei_L338, LS_rei_L355, LS_rei_L339, LS_rei_L343, LS_rei_L353

Consensus MOTU 17 [MOTU_17_(bPTP)&MOTU_10_(GMYC)&MOTU_17_(PTP)]
LS_piv_B140, LS_piv_B141, LS_piv_B142, LS_piv_B144

Consensus MOTU 18 [MOTU_18_(bPTP)&MOTU_02_(GMYC)&MOTU_18_(PTP)]
LS_piv_B060, LS_piv_B095, LS_piv_B093, LS_piv_L002, LS_piv_L003, LS_piv_L011, LS_piv_L010, LS_piv_B076, LS_piv_B094, LS_piv_L284, LS_piv_L371, LS_piv_B078, LS_piv_L005, LS_piv_L006, LS_piv_L014, LS_piv_B096, LS_piv_B097, LS_piv_B098, LS_piv_B099

Using k2p distance

In [60]:
SPdel.plot_compare_tree(basepath, Inputs.tree, compared[0])
#####
Warning: Nominal species not contigous in the tree.
#####

LS_obt_L315LS_obt_L314LS_obt_L320LS_obt_L253LS_obt_L269LS_obt_L267LS_obt_B071LS_obt_L316LS_obt_L266LS_obt_L268LS_obt_B070LS_obt_B090LS_obt_B031LS_elo_L305LS_elo_L300LS_elo_L1046LS_elo_L309LS_elo_L287LS_elo_L1041LS_elo_L308LS_elo_L307LS_elo_L1047LS_obt_L013LS_obt_L004LS_obt_L283LS_obt_L548LS_obt_L009LS_obt_B075LS_obt_L016LS_obt_B074LS_obt_B077LS_obt_L547LS_obt_L282LS_obt_L008LS_obt_L007LS_obt_B102LS_obt_B101LS_obt_B103LS_obt_B100LS_piv_L014LS_piv_L006LS_piv_L005LS_piv_B078LS_piv_L284LS_piv_B094LS_piv_B076LS_piv_L371LS_piv_L011LS_piv_L003LS_piv_L002LS_piv_L010LS_piv_B095LS_piv_B060LS_piv_B093LS_piv_B098LS_piv_B097LS_piv_B096LS_piv_B099LS_piv_B142LS_piv_B141LS_piv_B144LS_piv_B140LS_obt_L084LS_gar_L296LS_gar_L294LS_gar_L295LS_gar_L298LS_gar_L293LS_brn_L932LS_brn_L931LS_brn_L930LS_con_L292LS_con_L291LS_con_L820LS_con_L286LS_con_L211LS_con_L210LS_rei_L355LS_rei_L338LS_rei_L341LS_rei_B073LS_rei_L342LS_rei_B072LS_rei_L353LS_rei_L343LS_rei_L339LS_rei_L779LS_rei_L778LS_rei_L777LS_rei_L773LS_muy_L915LS_muy_L913LS_muy_L914LS_mac_L891LS_mac_L212LS_mac_L225LS_mac_L178LS_mac_L085LS_mac_L083LS_mac_L890LS_mac_B082LS_mac_L290LS_mac_B088LS_mac_B087LS_mac_B089LS_mac_B086LS_mac_B061LS_tri_L955LS_tri_L618LS_tri_L690LS_tri_L519LS_tri_L621LS_tri_L182LS_tri_L180LS_tri_L179LS_muy_L907NominalBINbPTPGMYCPTPConsensusNominalBINbPTPGMYCPTPConsensusNominalBINbPTPGMYCPTPConsensusNominalBINbPTPGMYCPTPConsensusNominalBINbPTPGMYCPTPConsensusNominalBINbPTPGMYCPTPConsensus0235791012

Saving tree plot as svg

In [61]:
SPdel.plot_compare_tree(basepath, Inputs.tree, compared[0],save=True)
#####
Warning: Nominal species not contigous in the tree.
#####

In [62]:
compared[1].print_summary()
Out[62]:
Mean Max NN DtoNN
MOTU_1 0.000000 0.00000 MOTU_4 6.77516
MOTU_10 NaN NaN MOTU_17 2.90372
MOTU_11 0.000000 0.00000 MOTU_9 3.98825
MOTU_12 0.136404 0.34101 MOTU_5 1.55005
MOTU_13 NaN NaN MOTU_7 7.47916
MOTU_14 0.000000 0.00000 MOTU_1 11.60407
MOTU_15 0.000000 0.00000 MOTU_16 0.67115
MOTU_16 0.000000 0.00000 MOTU_15 0.67115
MOTU_17 0.083475 0.16695 MOTU_18 0.67024
MOTU_18 0.058574 0.16772 MOTU_17 0.67024
MOTU_2 0.037100 0.16695 MOTU_6 2.73593
MOTU_3 0.000000 0.00000 MOTU_4 7.68126
MOTU_4 0.143081 0.50167 MOTU_6 2.84291
MOTU_5 0.000000 0.00000 MOTU_12 1.55005
MOTU_6 0.000000 0.00000 MOTU_2 2.73593
MOTU_7 0.000000 0.00000 MOTU_5 4.51779
MOTU_8 0.000000 0.00000 MOTU_7 6.33176
MOTU_9 0.000000 0.00000 MOTU_11 3.98825
In [63]:
compared[1].plot_max_min()
In [64]:
compared[1].plot_freq() 
In [65]:
compared[1].plot_heatmap() 
In [66]:
compared[1].plot_heatmap(upper=4) 

Specifying only some analysis to include in comparision¶

Note that in the previous example we use 'All' to indicate all methods previously calculated (and with results in the working folder), however we can indicate the analyses to be included in the comparision

In [67]:
Compare_list = 'Nominal,BIN,bPTP,PTP'
compared=SPdel.run_comparison(basepath,Inputs,Compare_list)
#####################
 Consensus MOTUs
#####################

### All MOTUs match with the taxonomy ###

Consensus MOTU 1 [LS_brn_(Nominal)&MOTU_ADB0463_(BIN)&MOTU_03_(bPTP)&MOTU_03_(PTP)]
LS_brn_L930, LS_brn_L931, LS_brn_L932

Consensus MOTU 2 [LS_elo_(Nominal)&MOTU_ABY2894_(BIN)&MOTU_10_(bPTP)&MOTU_10_(PTP)]
LS_elo_L1041, LS_elo_L1046, LS_elo_L1047, LS_elo_L287, LS_elo_L300, LS_elo_L305, LS_elo_L307, LS_elo_L308, LS_elo_L309

Consensus MOTU 3 [LS_gar_(Nominal)&MOTU_ACL3227_(BIN)&MOTU_04_(bPTP)&MOTU_04_(PTP)]
LS_gar_L293, LS_gar_L294, LS_gar_L295, LS_gar_L296, LS_gar_L298

### Most MOTUs agree with the taxonomy ###

### All MOTUs do not match the taxonomy###

Consensus MOTU 4 [MOTU_AAB8578_(BIN)&MOTU_09_(bPTP)&MOTU_09_(PTP)]
LS_obt_B074, LS_obt_B075, LS_obt_B077, LS_obt_B100, LS_obt_B101, LS_obt_B102, LS_obt_B103, LS_obt_L004, LS_obt_L007, LS_obt_L008, LS_obt_L009, LS_obt_L013, LS_obt_L016, LS_obt_L282, LS_obt_L283, LS_obt_L547, LS_obt_L548

Consensus MOTU 5 [MOTU_AAE5328_(BIN)&MOTU_14_(bPTP)&MOTU_14_(PTP)]
LS_mac_B082, LS_mac_L083, LS_mac_L085, LS_mac_L178, LS_mac_L212, LS_mac_L225, LS_mac_L290, LS_mac_L890, LS_mac_L891

Consensus MOTU 6 [MOTU_ABZ0928_(BIN)&MOTU_11_(bPTP)&MOTU_11_(PTP)]
LS_obt_B031, LS_obt_B070, LS_obt_B071, LS_obt_B090, LS_obt_L253, LS_obt_L266, LS_obt_L267, LS_obt_L268, LS_obt_L269, LS_obt_L314, LS_obt_L315, LS_obt_L316, LS_obt_L320

Consensus MOTU 7 [MOTU_ACL3073_(BIN)&MOTU_08_(bPTP)&MOTU_08_(PTP)]
LS_tri_L519, LS_tri_L618, LS_tri_L621, LS_tri_L690, LS_tri_L955

Consensus MOTU 8 [MOTU_ACL3074_(BIN)&MOTU_05_(bPTP)&MOTU_05_(PTP)]
LS_tri_L179, LS_tri_L180, LS_tri_L182

Consensus MOTU 9 [MOTU_ACL3731_(BIN)&MOTU_07_(bPTP)&MOTU_07_(PTP)]
LS_con_L210, LS_con_L211

Consensus MOTU 10 [MOTU_ACL3942_(BIN)&MOTU_12_(bPTP)&MOTU_12_(PTP)]
LS_obt_L084

Consensus MOTU 11 [MOTU_ACL4264_(BIN)&MOTU_06_(bPTP)&MOTU_06_(PTP)]
LS_con_L286, LS_con_L291, LS_con_L292, LS_con_L820

Consensus MOTU 12 [MOTU_ACO1303_(BIN)&MOTU_13_(bPTP)&MOTU_13_(PTP)]
LS_mac_B061, LS_mac_B086, LS_mac_B087, LS_mac_B088, LS_mac_B089

Consensus MOTU 13 [MOTU_ADB0512_(BIN)&MOTU_02_(bPTP)&MOTU_02_(PTP)]
LS_muy_L907

Consensus MOTU 14 [MOTU_ADB0701_(BIN)&MOTU_01_(bPTP)&MOTU_01_(PTP)]
LS_muy_L913, LS_muy_L914, LS_muy_L915

### Most MOTUs do not match the taxonomy ###

Consensus MOTU 15 [MOTU_15_(bPTP)&MOTU_15_(PTP)]
LS_rei_L773, LS_rei_L777, LS_rei_L778, LS_rei_L779

Consensus MOTU 16 [MOTU_16_(bPTP)&MOTU_16_(PTP)]
LS_rei_B072, LS_rei_L342, LS_rei_B073, LS_rei_L341, LS_rei_L338, LS_rei_L355, LS_rei_L339, LS_rei_L343, LS_rei_L353

Consensus MOTU 17 [MOTU_17_(bPTP)&MOTU_17_(PTP)]
LS_piv_B140, LS_piv_B141, LS_piv_B142, LS_piv_B144

Consensus MOTU 18 [MOTU_18_(bPTP)&MOTU_18_(PTP)]
LS_piv_B060, LS_piv_B095, LS_piv_B093, LS_piv_L002, LS_piv_L003, LS_piv_L011, LS_piv_L010, LS_piv_B076, LS_piv_B094, LS_piv_L284, LS_piv_L371, LS_piv_B078, LS_piv_L005, LS_piv_L006, LS_piv_L014, LS_piv_B096, LS_piv_B097, LS_piv_B098, LS_piv_B099

Using k2p distance

Don't plotting the consensus bar

In [68]:
SPdel.plot_compare_tree(basepath, Inputs.tree, compared[0],nocons=True)
#####
Warning: Nominal species not contigous in the tree.
#####

LS_obt_L315LS_obt_L314LS_obt_L320LS_obt_L253LS_obt_L269LS_obt_L267LS_obt_B071LS_obt_L316LS_obt_L266LS_obt_L268LS_obt_B070LS_obt_B090LS_obt_B031LS_elo_L305LS_elo_L300LS_elo_L1046LS_elo_L309LS_elo_L287LS_elo_L1041LS_elo_L308LS_elo_L307LS_elo_L1047LS_obt_L013LS_obt_L004LS_obt_L283LS_obt_L548LS_obt_L009LS_obt_B075LS_obt_L016LS_obt_B074LS_obt_B077LS_obt_L547LS_obt_L282LS_obt_L008LS_obt_L007LS_obt_B102LS_obt_B101LS_obt_B103LS_obt_B100LS_piv_L014LS_piv_L006LS_piv_L005LS_piv_B078LS_piv_L284LS_piv_B094LS_piv_B076LS_piv_L371LS_piv_L011LS_piv_L003LS_piv_L002LS_piv_L010LS_piv_B095LS_piv_B060LS_piv_B093LS_piv_B098LS_piv_B097LS_piv_B096LS_piv_B099LS_piv_B142LS_piv_B141LS_piv_B144LS_piv_B140LS_obt_L084LS_gar_L296LS_gar_L294LS_gar_L295LS_gar_L298LS_gar_L293LS_brn_L932LS_brn_L931LS_brn_L930LS_con_L292LS_con_L291LS_con_L820LS_con_L286LS_con_L211LS_con_L210LS_rei_L355LS_rei_L338LS_rei_L341LS_rei_B073LS_rei_L342LS_rei_B072LS_rei_L353LS_rei_L343LS_rei_L339LS_rei_L779LS_rei_L778LS_rei_L777LS_rei_L773LS_muy_L915LS_muy_L913LS_muy_L914LS_mac_L891LS_mac_L212LS_mac_L225LS_mac_L178LS_mac_L085LS_mac_L083LS_mac_L890LS_mac_B082LS_mac_L290LS_mac_B088LS_mac_B087LS_mac_B089LS_mac_B086LS_mac_B061LS_tri_L955LS_tri_L618LS_tri_L690LS_tri_L519LS_tri_L621LS_tri_L182LS_tri_L180LS_tri_L179LS_muy_L907NominalBINbPTPPTPNominalBINbPTPPTPNominalBINbPTPPTPNominalBINbPTPPTP01235678

Calculating diagnostic characters¶

SPdel allows you to calculate diagnostic characters in your dataset, following BOLD classification https://v3.boldsystems.org/index.php/resources/handbook?chapter=5_analysis_extended.html%C2%A7ion=diagnostic_characters.

In [69]:
diagnostic=SPdel.diagnostics(basepath,'Nominal')
diagnostic.summary
Out[69]:
N Diagnostic Diag. or Partial Partial Part. or Uninformative Invalid
LS_brn 3 6 1 8 0 0
LS_con 6 1 0 1 1 0
LS_elo 9 3 0 4 1 0
LS_gar 5 8 3 4 2 0
LS_mac 14 3 0 6 2 0
LS_muy 4 0 0 0 1 0
LS_obt 31 0 0 0 0 0
LS_piv 23 0 0 9 1 0
LS_rei 13 3 0 3 0 0
LS_tri 8 1 0 4 1 0
In [70]:
diagnostic.plot_legend()
In [71]:
diagnostic.plot_dcs()

However the Diagnostic character analysis has only sense when you compare two or few related species, and not all your dataset (with more species there is more difficult to find diagnostic characters). Remeber that the aim of this analysis is to differentiate two or few related species that have low genetic diversity. So, we can specify the species or MOTUs to be includede as follow:

In [72]:
diagnostic=SPdel.diagnostics(basepath,'Nominal',['LS_obt','LS_piv'])
diagnostic.summary
Out[72]:
N Diagnostic Diag. or Partial Partial Part. or Uninformative Invalid
LS_obt 31 3 0 4 0 0
LS_piv 23 3 0 43 19 0
In [73]:
diagnostic.plot_dcs()

Also, as default the SPdel only included groups with at least 3 individual, you can change that!

In [74]:
diagnostic=SPdel.diagnostics(basepath,'Nominal',n_ind=8)
diagnostic.summary
Out[74]:
N Diagnostic Diag. or Partial Partial Part. or Uninformative Invalid
LS_elo 9 4 0 5 1 0
LS_mac 14 7 0 7 0 0
LS_obt 31 0 0 0 0 0
LS_piv 23 0 0 10 1 0
LS_rei 13 7 0 5 0 0
LS_tri 8 3 0 5 0 0
In [75]:
diagnostic.plot_dcs()

Finally, Remeber that you can use the Diagnostic character analysis with any analysis previously calculated

In [76]:
diagnostic=SPdel.diagnostics(basepath,'Consensus',['MOTU_17','MOTU_18'])
diagnostic.summary
Out[76]:
N Diagnostic Diag. or Partial Partial Part. or Uninformative Invalid
MOTU_18 19 4 0 1 0 0
MOTU_17 4 4 0 1 3 0
In [77]:
diagnostic.plot_dcs()

Thanks for use SPdel!